tests_gaudi: Added L2 vllm workload #329
Conversation
tests/gaudi/l2/README.md
Outdated
@@ -74,4 +74,83 @@ Welcome to HCCL demo
[BENCHMARK] NW Bandwidth : 258.209121 GB/s
[BENCHMARK] Algo Bandwidth : 147.548069 GB/s
####################################################################################################
```

## VLLM
vLLM
tests/gaudi/l2/README.md
Outdated
```

## VLLM
VLLM is a serving engine for LLMs. The following workload deploys a VLLM server with an LLM using Intel Gaudi. Refer to the [Intel Gaudi VLLM fork](https://github.com/HabanaAI/vllm-fork.git) for more details.
vLLM
Build the workload container image:
```
$ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/vllm_buildconfig.yaml
```
Could we add an instruction so the user knows whether the build succeeded? :-)
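For example, a verification step along these lines could work; the build name below is an assumption, since the actual name depends on the BuildConfig metadata in this PR:

```
# List builds created by the BuildConfig and watch for STATUS "Complete"
oc get builds

# Follow the build logs; the build name is an assumption -- use the name
# reported by `oc get builds` (typically <buildconfig-name>-1)
oc logs -f build/vllm-workload-1

# Once the build completes, confirm the resulting image stream tag exists
oc get istag
```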
```
Deploy the workload:
* Update the Hugging Face token and the PVC according to your cluster setup
```
Could we add some detail about setting the Hugging Face token, and also give a brief introduction to the model we are using? :-)
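For instance, something like the following could be documented for the token; the secret name and key are assumptions and would need to match whatever the deployment manifest actually references:

```
# Store the Hugging Face access token in a secret so the vLLM pod can pull
# gated models (secret/key names are assumptions; match vllm_deployment.yaml)
oc create secret generic hf-token --from-literal=HF_TOKEN=<your_huggingface_token>

# Verify the secret exists before deploying the workload
oc get secret hf-token
```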
tests/gaudi/l2/vllm_buildconfig.yaml
Outdated
runPolicy: "Serial"
source:
  git:
    uri: https://github.com/opendatahub-io/vllm.git
After comparing:
1. https://github.com/opendatahub-io/vllm.git - ODH vLLM fork
2. https://github.com/vllm-project/vllm - vLLM upstream
3. https://github.com/HabanaAI/vllm-fork - Habana vLLM fork

I think we should currently start from 3, with the change from 1 (adding the UBI-based Dockerfile for RH OpenShift). Intel is upstreaming from 3 to 2, so in the long run we will use 2.
So I think we need to: 1) submit a PR adding the UBI-based Dockerfile for RH and add RH 9.4 support to the documents; 2) use repo 3; 3) the owner of 3 will presumably also help upstream the UBI-based Dockerfile and docs to 2; 4) after that we can switch to 2, the upstream vLLM.
@vbedida79 any comments? :-)
HabanaAI/vllm-fork#190 is the PR adding the UBI Dockerfile from RH; once it is merged into the vLLM Gaudi fork repo, we can use that directly. For now we can use the UBI image maintained by RH in https://github.com/opendatahub-io/vllm.git, which is based on https://github.com/HabanaAI/vllm-fork.
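If the source repo or branch needs to change again later (for example once the UBI Dockerfile lands in the Habana fork), a rough sketch of re-pointing the BuildConfig without re-applying the YAML; the BuildConfig name and branch placeholder are assumptions:

```
# Point the existing BuildConfig at a different git repo/ref
# ("vllm-workload" is an assumed name; check `oc get buildconfigs`)
oc patch buildconfig vllm-workload --type=merge -p \
  '{"spec":{"source":{"git":{"uri":"https://github.com/HabanaAI/vllm-fork.git","ref":"<branch-or-tag>"}}}}'

# Re-run the build against the new source
oc start-build vllm-workload --follow
```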
Updated according to comments, please review. Thanks.
@@ -75,3 +75,104 @@ Welcome to HCCL demo
[BENCHMARK] Algo Bandwidth : 147.548069 GB/s
####################################################################################################
```
<<<<<<< HEAD
PR/git commit comments: please note that the buildconfig is based on HabanaAI/vllm-fork#602.
Sure, updated in the PR and git commit.
Build the workload container image:
```
git clone https://github.com/opendatahub-io/vllm.git --branch gaudi-main
```
Should we use the 1.18.0 branch?
Updated to v1.18.0, from the https://github.com/HabanaAI/vllm-fork/tree/v1.18.0 repo.
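For reference, cloning the fork at that version would look roughly like this (plain git usage; the branch/tag name is taken from the link above):

```
# Clone the Habana vLLM fork at the v1.18.0 release branch/tag
git clone https://github.com/HabanaAI/vllm-fork.git --branch v1.18.0 --depth 1
cd vllm-fork
```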
tests/gaudi/l2/vllm_deployment.yaml
Outdated
- containerPort: 8000
resources:
  limits:
    habana.ai/gaudi: 4
Could we check and confirm how many accelerators vLLM actually uses?
I suggest starting with only a single accelerator.
I can check with 1 accelerator and update.
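A quick sketch of what testing with a single accelerator could look like; the deployment name `vllm` is an assumption and should match whatever vllm_deployment.yaml actually creates:

```
# Scale the Gaudi request down to one accelerator on the vLLM deployment
# ("vllm" is an assumed deployment name; check with `oc get deployments`)
oc set resources deployment/vllm \
  --requests=habana.ai/gaudi=1 --limits=habana.ai/gaudi=1

# Watch the pod reschedule and confirm it starts with a single card
oc get pods -w
```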
vllm gaudi ubi image based on PR HabanaAI/vllm-fork#602 Signed-off-by: vbedida79 <[email protected]>
PR includes gaudi l2 vllm workload
Signed-off-by: vbedida79 [email protected]